This is an interactive notebook. You can run it locally or use the link below:
Scorers as Guardrails
Weave scorers are special classes with a score method that evaluate the performance of a call. They can range from very simple rules to complex LLMs acting as judges.
In this notebook, we'll look at how to use scorers as guardrails to prevent an LLM from producing harmful or inappropriate content.
%pip install weave --quiet
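Before walking through the full example below, it helps to see the basic shape of a scorer: a subclass of weave.Scorer whose score method returns a verdict about a single output. Here is a minimal sketch (the BlocklistScorer class and its blocked_words field are illustrative names, not part of Weave itself):
python
import weave

# A minimal, illustrative scorer: the output passes the guardrail only if it
# contains none of the blocked words.
class BlocklistScorer(weave.Scorer):
    blocked_words: list[str]

    @weave.op
    def score(self, output: str) -> bool:
        # True means the output is considered safe
        return not any(word in output.lower() for word in self.blocked_words)

The full example that follows uses this same pattern, and additionally attaches the scorer to a call so its result can gate what gets returned to the user.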
python
"""
Example demonstrating how to implement guardrails in Weave.
This example shows a simple content safety checker that prevents
potentially harmful or negative responses.
"""
import weave
# Initialize Weave with a descriptive project name
weave.init("content-safety-guardrails")
class ContentSafetyScorer(weave.Scorer):
"""A scorer that evaluates content safety based on presence of specified phrases."""
unsafe_phrases: list[str]
case_sensitive: bool = False
@weave.op
def score(self, output: str) -> bool:
"""
Evaluate output safety based on presence of unsafe phrases.
Args:
output: The text output to evaluate
Returns:
bool: True if output is safe, False if unsafe
"""
normalized_output = output if self.case_sensitive else output.lower()
for phrase in self.unsafe_phrases:
normalized_phrase = phrase if self.case_sensitive else phrase.lower()
if normalized_phrase in normalized_output:
return False
return True
@weave.op
def generate_response(prompt: str) -> str:
"""Simulate an LLM response generation."""
if "test" in prompt.lower():
return "I'm sorry, I cannot process that request."
elif "help" in prompt.lower():
return "I'd be happy to help you with that!"
else:
return "Here's what you requested: " + prompt
async def process_with_guardrail(prompt: str) -> str:
"""
Process user input with content safety guardrail.
Returns the response if safe, or a fallback message if unsafe.
"""
# Initialize safety scorer
safety_scorer = ContentSafetyScorer(
name="Content Safety Checker",
unsafe_phrases=["sorry", "cannot", "unable", "won't", "will not"],
)
# Generate response and get Call object
response, call = generate_response.call(prompt)
# Apply safety scoring
evaluation = await call.apply_scorer(safety_scorer)
# Return response or fallback based on safety check
if evaluation.result:
return response
else:
return "I cannot provide that response."
python
"""Example usage of the guardrail system."""
test_prompts = [
"Please help me with my homework",
"Can you run a test for me?",
"Tell me a joke",
]
print("Testing content safety guardrails:\n")
for prompt in test_prompts:
print(f"Input: '{prompt}'")
response = await process_with_guardrail(prompt)
print(f"Response: {response}\n")